2 (Deep) Neural Networks
2.1 What is a Deep Neural Network?
2.1.1 Remark
Key Ideas:
1) Humans are good at detecting patterns
2) The brain solves a variety of tasks universally
3) Evolution has already perfected the construction
2.1.2 Remark (How to build a neuron?)
If the sum of input signals into a neuron surpasses a threshold, the neuron sends a signal.
2.1.3 Definition
An artificial neuron with weights \( \omega_1, \dots, \omega_n \in \mathbb{R} \), a bias \( b \in \mathbb{R} \), and an activation function (rectifier) \( \sigma: \mathbb{R} \to \mathbb{R} \) is the function \( f: \mathbb{R}^n \to \mathbb{R} \) given by
$$ \begin{aligned} f(x_1, \dots, x_n) &= \sigma \left( \sum_{i=1}^{n} x_i \omega_i - b \right) \\ &= \sigma \left( \langle x, \omega \rangle - b \right) \end{aligned} $$
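A minimal NumPy sketch of Definition 2.1.3; the function names, the choice of the Heaviside function as \( \sigma \), and the example numbers are illustrative and not taken from the notes:

```python
import numpy as np

def heaviside(x):
    # Heaviside activation: 1 for x > 0, 0 otherwise
    return np.where(x > 0, 1.0, 0.0)

def neuron(x, w, b, sigma=heaviside):
    # f(x) = sigma(<x, w> - b), cf. Definition 2.1.3
    return sigma(np.dot(x, w) - b)

# Example: the neuron "fires" once the weighted input exceeds the bias
x = np.array([0.5, 1.0, -0.2])
w = np.array([1.0, 2.0, 0.5])
print(neuron(x, w, b=1.0))  # <x, w> - b = 1.4 > 0, so the output is 1.0
```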
2.1.4 Examples of activation functions
1) Heaviside function
$$ \sigma(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} $$
2) Sigmoidal function
$$ \sigma(x) = \frac{1}{1+e^{-x}} $$
3) Rectified Linear Unit (ReLU)
$$ \sigma(x) = \max \lbrace 0, x \rbrace $$
4) Softplus function
$$ \sigma(x) = \ln \left( 1 + e^{x} \right) $$
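The four examples above can be written in a few lines of NumPy; this is only an illustrative sketch, and the function names are not fixed anywhere in the notes:

```python
import numpy as np

def heaviside(x):
    # 1 for x > 0, 0 for x <= 0
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    # 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # max{0, x}, applied elementwise
    return np.maximum(0.0, x)

def softplus(x):
    # ln(1 + e^{x}); log1p keeps the result accurate when e^x is close to 0
    return np.log1p(np.exp(x))

x = np.linspace(-2, 2, 5)
for f in (heaviside, sigmoid, relu, softplus):
    print(f.__name__, f(x))
```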
2.1.5 Remark and Definition
An artificial neural network is a graph whose nodes are artificial neurons. Special case: a feed-forward neural network is a directed acyclic graph of artificial neurons. Neural networks that are not feed-forward are called recurrent neural networks.
2.1.6 Definition
Let \( d \in \mathbb{N} \) be the input dimension, \( L \in \mathbb{N} \) the number of layers, \( N_0 = d, N_1, \dots, N_L \in \mathbb{N} \) the numbers of neurons per layer, \( A_l \in \mathbb{R}^{N_l \times N_{l-1}} \), \( l = 1, \dots, L \), the weight matrices, \( b_l \in \mathbb{R}^{N_l} \), \( l = 1, \dots, L \), the biases, and \( \sigma: \mathbb{R} \to \mathbb{R} \) the activation function. Then
$$ \Phi = \left( \left( A_l, b_l \right) \right)_{l=1}^L $$
is called (the architecture of) a neural network. The map
$$ \mathcal{R}_{\sigma} \left( \Phi \right): \mathbb{R}^{d} \to \mathbb{R}^{N_L}, $$
$$ \mathcal{R}_{\sigma} \left( \Phi \right)(x) = x_L $$
with
$$ \begin{aligned} x_0 &:= x, \\ x_l &:= \sigma(A_l \cdot x_{l-1} - b_l), \quad l = 1, \dots, L-1, \\ x_L &:= A_L \cdot x_{L-1} - b_L \end{aligned} $$
is called the realization of the neural network with activation function \( \sigma \).
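A direct transcription of Definition 2.1.6 into NumPy might look as follows; representing \( \Phi \) as a list of \( (A_l, b_l) \) pairs, choosing ReLU as \( \sigma \), and the example dimensions are illustrative assumptions, not prescribed by the definition:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def realization(phi, x, sigma=relu):
    """Evaluate R_sigma(Phi)(x) for Phi = ((A_1, b_1), ..., (A_L, b_L)).

    The activation sigma is applied in layers 1, ..., L-1; the last layer
    is affine only, exactly as in Definition 2.1.6.
    """
    for A, b in phi[:-1]:
        x = sigma(A @ x - b)
    A_L, b_L = phi[-1]
    return A_L @ x - b_L

# Example: d = 3, L = 2, N_1 = 4, N_2 = 2 (random weights, just to show the shapes)
rng = np.random.default_rng(0)
phi = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
       (rng.standard_normal((2, 4)), rng.standard_normal(2))]
print(realization(phi, np.ones(3)))  # a vector in R^{N_L} = R^2
```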